Estimating the 6D pose of known objects is important for robots to interact with objects in the real world. The problem is challenging due to the variety of objects as well as the complexity of the scene caused by clutter and occlusion between objects. In this work, we introduce a new Convolutional Neural Network (CNN) for 6D object pose estimation named PoseCNN. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. PoseCNN is able to handle symmetric objects and is also robust to occlusion between objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN provides very good estimates using only color as input.
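A 2D object center plus a distance from the camera is enough to recover a full 3D translation once the camera geometry is known. The sketch below shows this, along with how a regressed quaternion becomes a rotation matrix. It is a minimal illustration under stated assumptions, not the PoseCNN implementation. It assumes a standard pinhole camera with known intrinsics (`fx`, `fy`, `px`, `py`) and a scalar-first `(w, x, y, z)` quaternion ordering. The function names are hypothetical.

```python
import numpy as np


def translation_from_center(center_uv, depth, fx, fy, px, py):
    """Back-project a 2D object center (u, v) at distance `depth` (T_z) to 3D.

    Assumes a pinhole camera: focal lengths fx, fy (pixels) and principal
    point (px, py). Inverts u = fx * T_x / T_z + px (and likewise for v).
    """
    u, v = center_uv
    tx = (u - px) * depth / fx
    ty = (v - py) * depth / fy
    return np.array([tx, ty, depth])


def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    A network's raw regression output is generally not unit length, so the
    quaternion is normalized first (assumed scalar-first ordering).
    """
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


# Usage with illustrative (made-up) intrinsics and predictions.
t = translation_from_center((400, 300), 2.0, 500.0, 500.0, 320.0, 240.0)
R = quaternion_to_rotation([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
pose = np.eye(4)
pose[:3, :3] = R  # rotation part of the 6D pose
pose[:3, 3] = t   # translation part of the 6D pose
```

Decoupling the pose this way means the network only has to predict image-space quantities (a center and a depth) for translation, while the camera intrinsics supply the rest of the geometry.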